feat!: support v0.11.1 #112
Code Review
This pull request updates the codebase to support vllm v0.11.1, which involves significant refactoring around memory allocation, platform integration, and attention mechanisms. The changes appear to align with the goal of supporting the new vllm version. I have found one critical issue in the device allocator patch that could lead to a runtime error and have provided a fix.
    if len(self._sleep_saved_buffers):
        model = self.model_runner.model
        for name, buffer in model.named_buffers():
            if name in self._sleep_saved_buffers:
                buffer.data.copy_(self._sleep_saved_buffers[name].data)
        self._sleep_saved_buffers = {}
There is a potential AttributeError here. The self._sleep_saved_buffers attribute is only initialized within the sleep method, and only when level == 2. If wake_up is called after sleep(level=1) or before any call to sleep, self._sleep_saved_buffers will not exist on the object, causing a crash when len() is called on it.
To prevent this, you should safely check for the attribute's existence before trying to access it.
Suggested change:
-    if len(self._sleep_saved_buffers):
-        model = self.model_runner.model
-        for name, buffer in model.named_buffers():
-            if name in self._sleep_saved_buffers:
-                buffer.data.copy_(self._sleep_saved_buffers[name].data)
-        self._sleep_saved_buffers = {}
+    if hasattr(self, "_sleep_saved_buffers") and self._sleep_saved_buffers:
+        model = self.model_runner.model
+        for name, buffer in model.named_buffers():
+            if name in self._sleep_saved_buffers:
+                buffer.data.copy_(self._sleep_saved_buffers[name].data)
+        self._sleep_saved_buffers = {}
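The failure mode described in this comment can be reproduced in isolation. The sketch below is only an illustration: `Worker`, `buffers`, `wake_up_unguarded`, `wake_up_guarded`, and `_restore` are hypothetical stand-ins (plain Python lists instead of tensors), not the worker class from this PR or from vLLM. It assumes, as the comment states, that `_sleep_saved_buffers` is created only by `sleep(level=2)`.

```python
class Worker:
    """Hypothetical stand-in for the reviewed worker; not the real class."""

    def __init__(self):
        # Model buffers are modelled as name -> list of floats.
        self.buffers = {"scale": [1.0, 2.0]}

    def sleep(self, level=1):
        # Assumption taken from the review: the attribute only exists
        # after a level-2 sleep.
        if level == 2:
            self._sleep_saved_buffers = {
                name: list(values) for name, values in self.buffers.items()
            }
            # Simulate the buffer contents being discarded while asleep.
            for values in self.buffers.values():
                values[:] = [0.0] * len(values)

    def _restore(self):
        # Copy saved values back in place, then clear the saved copies.
        for name, values in self.buffers.items():
            if name in self._sleep_saved_buffers:
                values[:] = self._sleep_saved_buffers[name]
        self._sleep_saved_buffers = {}

    def wake_up_unguarded(self):
        # Raises AttributeError if sleep(level=2) was never called.
        if len(self._sleep_saved_buffers):
            self._restore()

    def wake_up_guarded(self):
        # Same guard as the suggested change: check existence first.
        if hasattr(self, "_sleep_saved_buffers") and self._sleep_saved_buffers:
            self._restore()


if __name__ == "__main__":
    w = Worker()
    w.sleep(level=1)
    try:
        w.wake_up_unguarded()
    except AttributeError as exc:
        print("unguarded wake_up failed:", type(exc).__name__)
    w.wake_up_guarded()  # safe: the attribute is simply absent
    w.sleep(level=2)
    w.wake_up_guarded()  # restores the saved buffer contents
    print(w.buffers)
```

An alternative with the same effect would be to initialize the attribute to an empty dict in the constructor, which keeps the original `len()` check valid; whether that fits the patched class here is not something this sketch can confirm.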
a601543 to 1f48880
ab31312 to f516af8
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Xin Li <lixin1620@gmail.com>
Signed-off-by: leex404 <lixin1620@gmail.com>
…l` (#115)
* [fix] fix sample_recovered_tokens_kernel use too much private memory
  Signed-off-by: Xin Li <xin.li@metax-tech.com>
* [fix] fix type error in bf16_paged_mqa_logits
  Signed-off-by: Xin Li <xin.li@metax-tech.com>
* [chore] change file directory
  Signed-off-by: Xin Li <xin.li@metax-tech.com>
---------
Signed-off-by: Xin Li <xin.li@metax-tech.com>
Co-authored-by: Xin Li <xin.li@metax-tech.com>
Signed-off-by: leex404 <lixin1620@gmail.com>
de238f9 to 32d2d83
related: vllm-project/vllm/pull/27322
Signed-off-by: Hank <hcc.mayday@gmail.com>
Purpose
This PR adds support for vLLM v0.11.1.
Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.